Tractable Group Detection on Large Link Data Sets

Authors

  • Jeremy Kubica
  • Andrew W. Moore
  • Jeff G. Schneider

Abstract

Discovering underlying structure from co-occurrence data is an important task in a variety of fields, including insurance, intelligence, criminal investigation, epidemiology, human resources, and marketing. Previously, Kubica et al. presented the group detection algorithm (GDA), an algorithm for finding underlying groupings of entities from co-occurrence data. This algorithm is based on a probabilistic generative model and produces coherent groups that are consistent with prior knowledge. Unfortunately, the optimization used in GDA is slow, potentially making it infeasible for many large data sets. To address this, we present k-groups, an algorithm that uses an approach similar to that of k-means to significantly accelerate the discovery of groups while retaining GDA’s probabilistic model. We compare the performance of GDA and k-groups on a variety of data, showing that k-groups’ sacrifice in solution quality is significantly offset by its increase in speed.
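
The abstract does not spell out k-groups' update rules, so what follows is only a minimal sketch of what a k-means-style optimization of a probabilistic grouping model can look like, under stated assumptions. It swaps in a much simpler model than GDA's: each entity belongs to exactly one group, and a member of group g appears in link l independently with probability theta[g][l] (a Bernoulli mixture fitted by hard assignment, sometimes called classification EM). The names `k_groups_sketch`, `_row_loglik`, `theta`, `alpha`, and `iters` are hypothetical and chosen for illustration; this is not the authors' code. Like k-means, the sketch alternates a parameter step (re-estimating each group's link probabilities from its current members) with an assignment step (moving each entity to its most likely group) until no entity moves.

```python
import math
from typing import List, Sequence, Set


def _row_loglik(theta_g: Sequence[float], links_of_e: Set[int]) -> float:
    """Bernoulli log-likelihood of one entity's link-membership row under a group."""
    return sum(math.log(p) if l in links_of_e else math.log(1.0 - p)
               for l, p in enumerate(theta_g))


def k_groups_sketch(links: List[Set[int]], n_entities: int, k: int,
                    iters: int = 20, alpha: float = 1.0) -> List[int]:
    """Hard-assign each entity to one of k groups by alternating two steps.

    Hypothetical simplification, NOT GDA's generative model: each entity is in
    exactly one group, and a member of group g appears in link l independently
    with probability theta[g][l] (a Bernoulli mixture fitted k-means-style).
    """
    n_links = len(links)
    # links_of[e] = indices of the links (co-occurrence records) containing e.
    links_of = [set() for _ in range(n_entities)]
    for l, link in enumerate(links):
        for e in link:
            links_of[e].add(l)

    # Deterministic round-robin start for reproducibility; in practice one
    # would try several random initializations and keep the most likely result.
    assign = [e % k for e in range(n_entities)]
    for _ in range(iters):
        # Parameter ("centroid") step: per-group link-inclusion probabilities,
        # smoothed with pseudo-count alpha so every theta lies strictly in (0, 1).
        size = [0] * k
        counts = [[0] * n_links for _ in range(k)]
        for e, g in enumerate(assign):
            size[g] += 1
            for l in links_of[e]:
                counts[g][l] += 1
        theta = [[(counts[g][l] + alpha) / (size[g] + 2 * alpha)
                  for l in range(n_links)] for g in range(k)]

        # Assignment step: move each entity to the group that makes its
        # link-membership row most likely (ties go to the lowest group index).
        new_assign = [max(range(k), key=lambda g: _row_loglik(theta[g], links_of[e]))
                      for e in range(n_entities)]
        if new_assign == assign:
            break  # converged: no entity changed group
        assign = new_assign
    return assign


# Two obvious cliques of co-occurring entities.
links = [{0, 1, 2}, {0, 1, 2}, {3, 4, 5}, {3, 4, 5}]
print(k_groups_sketch(links, n_entities=6, k=2))  # [0, 0, 0, 1, 1, 1]
```

Each iteration of the sketch touches every (entity, group, link) triple once, so its cost is O(k · n_entities · n_links). As with k-means, the result depends on the starting assignment: a start in which every group has identical link statistics leaves all likelihoods tied, and the loop stops without separating anything.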

Similar articles

K-groups: tractable group detection on large link data sets

Discovering underlying structure from co-occurrence data is an important task in many fields, including insurance, intelligence, criminal investigation, epidemiology, human resources, and marketing. For example, a store may wish to identify underlying sets of items purchased together, or a human resources department may wish to identify groups of employees that collaborate with each other. Previ...

Application of Recursive Least Squares to Efficient Blunder Detection in Linear Models

In many geodetic applications, a large number of observations are measured to estimate the unknown parameters. The unbiasedness of the estimated parameters is only ensured if there are no biases (e.g., systematic effects) or falsifying observations, which are also known as outliers. One of the most important steps towards obtaining a coherent analysis for the parameter estimation is th...

A Flexible Link Radar Control Based on Type-2 Fuzzy Systems

An adaptive neuro-fuzzy inference system based on interval Gaussian type-2 fuzzy sets in the antecedent part and Gaussian type-1 fuzzy sets as coefficients of a linear combination of input variables in the consequent part is presented in this paper. The capability of the proposed method (which we named ANFIS2) for function approximation and dynamical system identification is remarkable. The structure o...

Effectiveness of spectral data reduction in detection of salt-affected soils in a small study area

Data reduction is used to aggregate or amalgamate large data sets into smaller, manageable pieces of information to enable fast and accurate classification of different attributes. However, excessive spatial or spectral data reduction may result in losing or masking important radiometric information. Therefore, we conducted this research to evaluate the effectiveness of the different...

FDG-PET/MRI fused data sets for the detection of liver metastases in patients undergoing systemic anticancer treatment

Background: To retrospectively describe imaging characteristics of liver metastases on fused FDG-PET/MRI data sets and to compare the diagnostic accuracy of MRI and fused FDG-PET/MRI data sets for the detection of liver metastases in patients undergoing systemic anticancer treatment. Materials and Methods: 43 oncological patients (mean age: 56 ± 11 years) were investigated by FDG-PET...

Publication date: 2003